The HOLJ Corpus: Supporting Summarisation Of Legal Texts

Authors

  • Claire Grover
  • Ben Hachey
  • Ian Hughson
Abstract

We describe an XML-encoded corpus of texts in the legal domain which was gathered for an automatic summarisation project. We describe two distinct layers of annotation: manual annotation of the rhetorical status of sentences and an entirely automatic annotation process incorporating a host of individual linguistic processors. The manual rhetorical status annotation has been developed as training and testing material for a summarisation system based on the work of Teufel and Moens, while the automatic layer of annotation encodes linguistic information as features for a machine learning approach to rhetorical status classification.

Similar resources

A Rhetorical Status Classifier For Legal Text Summarisation

We describe a classifier which determines the rhetorical status of sentences in texts from a corpus of judgments of the UK House of Lords. Our summarisation system is based on the work of Teufel and Moens where sentences are classified for rhetorical status to aid sentence selection. We experiment with a variety of linguistic features with results comparable to Teufel and Moens, thereby demonst...

"Why do you Ignore me?" - Proof that not all Direct Speech is Bad

In the automatic summarisation of written texts, direct speech is usually deemed unsuitable for inclusion in important sentences. This is due to the fact that humans do not usually include such quotations when they create summaries. In this paper, we argue that despite generally negative attitudes, direct speech can be useful for summarisation and ignoring it can result in the omission of impor...

Term-based Identification of Sentences for Text Summarisation

The present paper describes a methodology for automatic text summarisation of Greek texts which combines terminology extraction and sentence spotting. Since generating abstracts has proven a hard NLP task of questionable effectiveness, the paper focuses on the production of a special kind of abstracts, called extracts: sets of sentences taken from the original text. These sentences are selected...

Sentence Classification Experiments for Legal Text Summarisation

We describe experiments in building a classifier which determines the rhetorical status of sentences. The research is part of a text summarisation project for the legal domain and we use a newly compiled and annotated corpus of judgments of the UK House of Lords. Rhetorical role classification is an initial step which provides input to the sentence selection component of the system. We report r...

Building Corpora for the Philological Study of Swiss Legal Texts

We describe the construction of two corpora in the domain of Swiss legal texts: The DS21 corpus is based on the Collection of Swiss Law Sources and contains historical legal texts from the early Middle Ages up to 1798; the Swiss Legislation Corpus (SLC) is based on the Classified Compilation of Swiss Federal Legislation and contains all current Swiss federal laws. The paper summarizes the key p...

Publication date: 2004